Event-Based Vision
EREBUS: End-to-end Robust Event Based Underwater Simulation
Kyatham, Hitesh, Suresh, Arjun, Palnitkar, Aadi, Aloimonos, Yiannis
Abstract--The underwater domain presents a vast array of challenges for roboticists and computer vision researchers alike, such as poor lighting conditions and high dynamic range scenes. In these adverse conditions, traditional vision techniques struggle to adapt and lead to suboptimal performance. Event-based cameras present an attractive solution to this problem, mitigating the issues of traditional cameras by reporting per-pixel brightness changes asynchronously rather than capturing full frames. In this paper, we introduce a pipeline which can be used to generate realistic synthetic data from an event-based camera mounted on an AUV (Autonomous Underwater Vehicle) in an underwater environment for training vision models. We demonstrate the effectiveness of our pipeline on the task of rock detection under poor visibility and suspended particulate matter, but the approach can be generalized to other underwater tasks.
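The core of such a pipeline is an event simulator that converts rendered frames into events. Below is a minimal sketch of the standard log-intensity thresholding model; the function name and the fixed contrast threshold are illustrative assumptions, and the paper's simulator additionally models underwater effects such as suspended particulates.

```python
import numpy as np

def frames_to_events(prev_frame, curr_frame, threshold=0.2, t=0.0):
    """Emit (x, y, t, p) events where the log-intensity change exceeds a
    contrast threshold. An idealized event-camera model: real simulators
    also track per-pixel reference levels, refractory periods, and noise."""
    eps = 1e-6  # avoid log(0)
    dlog = np.log(curr_frame + eps) - np.log(prev_frame + eps)
    ys, xs = np.nonzero(np.abs(dlog) >= threshold)
    polarity = np.where(dlog[ys, xs] > 0, 1, -1)
    return [(int(x), int(y), t, int(p)) for x, y, p in zip(xs, ys, polarity)]

# Example: one pixel brightens between two frames -> one positive event.
prev = np.full((4, 4), 0.1)
curr = prev.copy()
curr[1, 2] = 0.5
events = frames_to_events(prev, curr, threshold=0.2)
print(events)  # [(2, 1, 0.0, 1)]
```

Because only changed pixels emit events, a static underwater scene produces no output at all, which is exactly the data-efficiency property the abstract appeals to.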
STREAM: A Universal State-Space Model for Sparse Geometric Data
Schöne, Mark, Bhisikar, Yash, Bania, Karan, Nazeer, Khaleelulla Khan, Mayr, Christian, Subramoney, Anand, Kappel, David
Handling sparse and unstructured geometric data, such as point clouds or event-based vision, is a pressing challenge in the field of machine vision. Recently, sequence models such as Transformers and state-space models entered the domain of geometric data. These methods require specialized preprocessing to create a sequential view of a set of points. Furthermore, prior works involving sequence models iterate over geometric data with either uniform or learned step sizes, implicitly relying on the model to infer the underlying geometric structure. In this work, we propose to encode geometric structure explicitly into the parameterization of a state-space model. State-space models are based on linear dynamics governed by a one-dimensional variable such as time or a spatial coordinate. We exploit this dynamic variable to inject relative differences of coordinates into the step size of the state-space model. The resulting geometric operation computes interactions between all pairs of N points in O(N) steps. Our model deploys the Mamba selective state-space model with a modified CUDA kernel to efficiently map sparse geometric data to modern hardware. The resulting sequence model, which we call STREAM, achieves competitive results on a range of benchmarks from point-cloud classification to event-based vision and audio classification. STREAM demonstrates a powerful inductive bias for sparse geometric data by improving the PointMamba baseline when trained from scratch on the ModelNet40 and ScanObjectNN point cloud analysis datasets. It further achieves, for the first time, 100% test accuracy on all 11 classes of the DVS128 Gestures dataset.
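The key idea — injecting relative coordinate differences into the SSM step size — can be sketched with a toy scalar-state recurrence. All names and the one-dimensional state are illustrative simplifications, not the paper's Mamba-based model with its custom CUDA kernel.

```python
import numpy as np

def stream_scan(coords, features, a=-1.0, b=1.0):
    """Toy 1-state linear SSM scanned over a sorted point sequence.

    The step size of the recurrence is the relative coordinate difference
    between consecutive points, so geometric structure enters the dynamics
    directly: nearby points interact strongly, distant points decay."""
    h = 0.0
    out = np.empty_like(features, dtype=float)
    prev = coords[0]
    for k, (c, x) in enumerate(zip(coords, features)):
        dt = c - prev                         # relative coordinate difference
        h = np.exp(a * dt) * h + dt * b * x   # ZOH-style discretized update
        out[k] = h                            # first step has dt = 0, so out[0] = 0
        prev = c
    return out

coords = np.array([0.0, 0.5, 1.0])
out = stream_scan(coords, np.array([1.0, 1.0, 1.0]))
```

With a single linear scan the state carries (exponentially decayed) influence from every earlier point to every later one, which is how all-pairs interactions arrive in O(N) steps.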
State Space Models for Event Cameras
Zubić, Nikola, Gehrig, Mathias, Scaramuzza, Davide
Today, state-of-the-art deep neural networks that process event-camera data first convert a temporal window of events into dense, grid-like input representations. As such, they exhibit poor generalizability when deployed at higher inference frequencies (i.e., smaller temporal windows) than the ones they were trained on. We address this challenge by introducing state-space models (SSMs) with learnable timescale parameters to event-based vision. This design adapts to varying frequencies without the need to retrain the network at different frequencies. Additionally, we investigate two strategies to counteract aliasing effects when deploying the model at higher frequencies. We comprehensively evaluate our approach against existing methods based on RNN and Transformer architectures across various benchmarks, including Gen1 and 1 Mpx event camera datasets. Our results demonstrate that SSM-based models train 33% faster and also exhibit minimal performance degradation when tested at higher frequencies than the training input. Traditional RNN and Transformer models exhibit performance drops of more than 20 mAP, with SSMs having a drop of 3.76 mAP, highlighting the effectiveness of SSMs in event-based vision tasks.
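Why a learnable continuous-time parameterization transfers across inference frequencies can be seen in a scalar sketch: the discrete transition is an exponential of the continuous parameter times the window length, so two half-length steps compose exactly to one full-length step. This is illustrative only; the paper's SSM layers are multidimensional.

```python
import numpy as np

def discretize(a_cont, delta):
    """Zero-order-hold discretization of a scalar continuous-time SSM state.

    a_cont is the learned continuous-time parameter; delta is whatever
    temporal window length is used at inference time."""
    return np.exp(a_cont * delta)

a = -2.0                            # 'learned' continuous-time parameter
full = discretize(a, 0.1)           # transition over the training window
halves = discretize(a, 0.05) ** 2   # two steps at twice the inference rate
print(np.isclose(full, halves))     # True: dynamics consistent across rates
```

A model whose recurrence weights are baked in at a fixed rate (as in a standard RNN) has no such consistency, which is one way to read the >20 mAP drop reported for the baselines.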
GET: Group Event Transformer for Event-Based Vision
Peng, Yansong, Zhang, Yueyi, Xiong, Zhiwei, Sun, Xiaoyan, Wu, Feng
Event cameras are a type of novel neuromorphic sensor that has been gaining increasing attention. Existing event-based backbones mainly rely on image-based designs to extract spatial information within the image transformed from events, overlooking important event properties like time and polarity. To address this issue, we propose a novel Group-based vision Transformer backbone for Event-based vision, called Group Event Transformer (GET), which decouples temporal-polarity information from spatial information throughout the feature extraction process. Specifically, we first propose a new event representation for GET, named Group Token, which groups asynchronous events based on their timestamps and polarities. Then, GET applies the Event Dual Self-Attention block, and Group Token Aggregation module to facilitate effective feature communication and integration in both the spatial and temporal-polarity domains. After that, GET can be integrated with different downstream tasks by connecting it with various heads. We evaluate our method on four event-based classification datasets (Cifar10-DVS, N-MNIST, N-CARS, and DVS128Gesture) and two event-based object detection datasets (1Mpx and Gen1), and the results demonstrate that GET outperforms other state-of-the-art methods. The code is available at https://github.com/Peterande/GET-Group-Event-Transformer.
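A simplified, hypothetical version of the Group Token idea — bucketing events by timestamp bin and polarity before any learned embedding — might look like the following. The sensor size, bin count, and count-map representation are assumptions for illustration, not GET's actual tokenizer.

```python
import numpy as np

def group_tokens(events, num_time_bins=2, height=4, width=4):
    """Group events by (time bin, polarity) into per-group count maps.

    events: array of rows (x, y, t, p) with p in {-1, +1}. Grouping keeps
    temporal and polarity information separate from the spatial maps,
    instead of collapsing everything into one image."""
    t = events[:, 2].astype(float)
    span = np.ptp(t) + 1e-9                      # avoid division by zero
    bins = np.minimum((num_time_bins * (t - t.min()) / span).astype(int),
                      num_time_bins - 1)
    tokens = np.zeros((num_time_bins, 2, height, width))
    for (x, y, _, p), b in zip(events, bins):
        tokens[b, 0 if p > 0 else 1, int(y), int(x)] += 1
    return tokens

events = np.array([[0, 0, 0.0, 1],     # early, positive
                   [1, 1, 0.9, -1]])   # late, negative
tokens = group_tokens(events)
```

Each (time bin, polarity) pair yields its own spatial map, so downstream attention can mix spatial and temporal-polarity information in separate, explicit stages.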
What is Event-based Vision?
PROPHESEE creates both neuromorphic sensors and bio-inspired algorithms that function like the eye and brain. This holistic approach marks a fundamental shift in computer vision – the departure from frame-based sensors to event-based vision systems, also known as event cameras. Each pixel reports only when it senses movement. Whereas in a frame-based sensor all pixels record at the same time, in an event-based sensor each pixel is fully independent.
Moving Object Detection for Event-based vision using Graph Spectral Clustering
Mondal, Anindya, R, Shashant, Giraldo, Jhony H., Bouwmans, Thierry, Chowdhury, Ananda S.
Moving object detection has been a central topic of discussion in computer vision for its wide range of applications like in self-driving cars, video surveillance, security, and enforcement. Neuromorphic Vision Sensors (NVS) are bio-inspired sensors that mimic the working of the human eye. Unlike conventional frame-based cameras, these sensors capture a stream of asynchronous 'events' that pose multiple advantages over the former, like high dynamic range, low latency, low power consumption, and reduced motion blur. However, these advantages come at a high cost, as the event camera data typically contains more noise and has low resolution. Moreover, as event-based cameras can only capture the relative changes in brightness of a scene, event data do not contain usual visual information (like texture and color) as available in video data from normal cameras. So, moving object detection in event-based cameras becomes an extremely challenging task. In this paper, we present an unsupervised Graph Spectral Clustering technique for Moving Object Detection in Event-based data (GSCEventMOD). We additionally show how the optimum number of moving objects can be automatically determined. Experimental comparisons on publicly available datasets show that the proposed GSCEventMOD algorithm outperforms a number of state-of-the-art techniques by a maximum margin of 30%.
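The essence of graph spectral clustering on events can be sketched as a two-cluster cut along the Fiedler vector of an event-similarity graph. This is a numpy-only stand-in: GSCEventMOD additionally determines the number of clusters automatically and operates on richer event features than plain 2-D coordinates.

```python
import numpy as np

def spectral_event_clusters(points, sigma=1.0):
    """Split events into two clusters by the sign of the Fiedler vector.

    Build a Gaussian similarity graph over event coordinates, form the
    unnormalized graph Laplacian, and cut along its second-smallest
    eigenvector (k = 2 is fixed here for illustration)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W          # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    fiedler = vecs[:, 1]               # second-smallest eigenvector
    return (fiedler > 0).astype(int)

# Events from two separated objects should land in different clusters.
pts = np.array([[0, 0], [1, 0], [5, 0], [6, 0]], dtype=float)
labels = spectral_event_clusters(pts)
print(labels[0] != labels[2])  # True: the two groups are separated
```

The sign of the Fiedler vector approximates the minimum cut of the graph, which is why spatially separated groups of events fall into different clusters.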
Moving Object Detection for Event-based Vision using k-means Clustering
Mondal, Anindya, Das, Mayukhmali
Event-based cameras are bio-inspired sensors that mimic the working of the human eye (Gallego et al. [2020]). While frame-based cameras capture images at a definite frame rate determined by an external clock, each pixel in an event-based camera memorizes the log intensity each time an event is sent and simultaneously monitors for a sufficiently large change in magnitude from this memorized value (Gallego et al. [2020]). The event is recorded by the camera and transmitted by the sensor in the form of its location {x, y}, its time of occurrence (timestamp) t, and its polarity p (taking a binary value +1 or −1, representing whether the pixel became brighter or darker) (Chen et al. [2020]). The working of an event-based camera is shown in Figure 1. The sensors used in event-based cameras are data-driven, for their output depends on the amount of motion or brightness change in the scene (Gallego et al. [2020]): the greater the motion, the more events are generated. The events are recorded at microsecond resolution and transmitted with sub-millisecond latency, making these sensors react quickly to visual stimuli (Gallego et al. [2020]). Thus, while frame-based cameras capture the absolute brightness of a scene, event-based cameras capture per-pixel brightness changes asynchronously, making traditional computer vision algorithms unsuitable for processing event data. Detection of moving objects is an important task in automation, where a computer must differentiate between moving and stationary objects.
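Given events in (x, y, t, p) form, clustering their spatial coordinates is one simple route to separating moving objects. Below is a toy numpy k-means over event locations; the paper's actual pipeline, including noise handling and the choice of k, is not reproduced here.

```python
import numpy as np

def kmeans_events(xy, k=2, iters=20, seed=0):
    """Plain k-means on event (x, y) coordinates.

    Since events only fire where something changes, clusters of event
    locations tend to correspond to independently moving objects."""
    rng = np.random.default_rng(seed)
    centers = xy[rng.choice(len(xy), size=k, replace=False)]
    for _ in range(iters):
        d = ((xy[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)                       # nearest-center assignment
        centers = np.array([xy[labels == j].mean(0) for j in range(k)])
    return labels, centers

# Events from two moving objects, one near (0, 0) and one near (8, 8).
xy = np.array([[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8]], dtype=float)
labels, centers = kmeans_events(xy)
print(labels[0] != labels[3])  # the two objects receive distinct labels
```

Note that a stationary background generates no events at all, so — unlike with frame-based input — there is no need to subtract the background before clustering.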
EETimes - Bringing Neuromorphic Vision to the Edge
Neuromorphic vision sensing has demonstrated impressive results in machine vision applications and is gaining broader commercial traction. Several companies have introduced multiple generations of neuromorphic vision sensors, and a true ecosystem has evolved. Developers recognize its data-efficient approach to sensing and acquisition. As a result, we expect to see a greater proliferation of its use thanks to its ability to significantly improve the functionality, performance, and power consumption of machine vision-enabled systems. At its essence, neuromorphic sensing applies techniques drawn from neurobiology that mimic the efficiency and adaptability of the human brain, eye, and connected systems.